19 research outputs found

    Empirical test of the performance of an acoustic-phonetic approach to forensic voice comparison under conditions similar to those of a real case

    In a 2012 case in New South Wales, Australia, the identity of a speaker on several audio recordings was in question. Forensic voice comparison testimony was presented based on an auditory-acoustic-phonetic-spectrographic analysis. No empirical demonstration of the validity and reliability of the analytical methodology was presented. Unlike the admissibility standards in some other jurisdictions (e.g., US Federal Rule of Evidence 702 and the Daubert criteria, or England & Wales Criminal Practice Directions 19A), Australia's Unified Evidence Acts do not require demonstration of the validity and reliability of analytical methods and their implementation before testimony based upon them is presented in court. The present paper reports on empirical tests of the performance of an acoustic-phonetic-statistical forensic voice comparison system which exploited the same features as were the focus of the auditory-acoustic-phonetic-spectrographic analysis in the case, i.e., second-formant (F2) trajectories in /o/ tokens and mean fundamental frequency (f0). The tests were conducted under conditions similar to those in the case. The performance of the acoustic-phonetic-statistical system was very poor compared to that of an automatic system. © 2017 Elsevier B.V

    Formant trajectories in forensic speaker recognition

    The present work investigates the performance of an approach to forensic speaker recognition that is based on parametric representations of formant trajectories. Quadratic and cubic polynomial functions are fitted to the formant contours of diphthongs. The resulting coefficients, as well as the first three or four components of the discrete cosine transform (DCT), are used to capture the dynamic properties of the underlying speech acoustics, and thus of the speaker characteristics. The result is a representation consisting of a small number of decorrelated parameters, which are in turn used for forensic speaker recognition. The evaluation conducted in the study includes the calculation of likelihood ratios for use in the Bayesian approach to the evaluation of forensic evidence. The advantages of this framework and its current limitations are discussed. The likelihood ratios are calculated using a multivariate kernel density formula developed by Aitken & Lucy (2004), which takes both between-speaker and within-speaker variability into account. Automatic calibration and fusion techniques, as used in automatic speaker identification systems, are applied to the resulting scores. To further investigate the importance of the duration of diphthongs for forensic speaker recognition, an experiment is conducted that evaluates the effect of time normalisation and of modelling segment duration with an explicit parameter.
    The performance of the parametric representations compared with other methods, as well as the effects of calibration and fusion, is evaluated using standard evaluation tools such as the detection error trade-off (DET) plot, the Tippett plot, and the applied probability of error (APE) plot, together with numerical indices such as the equal error rate (EER) and the Cllr metric.
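    The parameterisation described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the F2 trajectory below is synthetic, the function names are hypothetical, and the orthonormal DCT-II scaling and the [0, 1] time-normalised axis are assumptions made for the example.

```python
import numpy as np

def dct2_coefficients(trajectory, n_coeffs=4):
    """Return the first n_coeffs DCT-II coefficients (orthonormal scaling)."""
    x = np.asarray(trajectory, dtype=float)
    n = x.size
    k = np.arange(n_coeffs)[:, None]      # coefficient index
    t = np.arange(n)[None, :]             # sample index
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def polynomial_coefficients(trajectory, degree=3):
    """Fit a polynomial on a time-normalised axis; lowest-order term first."""
    x = np.asarray(trajectory, dtype=float)
    t = np.linspace(0.0, 1.0, x.size)     # time normalisation to [0, 1]
    return np.polynomial.polynomial.polyfit(t, x, degree)

# Synthetic (hypothetical) F2 trajectory in Hz, sampled at 20 points.
t = np.linspace(0.0, 1.0, 20)
f2_hz = 1200.0 + 600.0 * t - 200.0 * t**2

dct_feats = dct2_coefficients(f2_hz, n_coeffs=4)
poly_feats = polynomial_coefficients(f2_hz, degree=3)
print("DCT features:", dct_feats)
print("Polynomial features:", poly_feats)
```

    Either feature vector reduces a whole trajectory to a handful of numbers. Because the time axis is normalised, duration information is lost unless it is added back as a separate parameter, which is the trade-off the duration experiment in the abstract examines.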

    Introduction to forensic voice comparison

    This chapter provides a brief introduction to forensic voice comparison. It describes different approaches that have been used to extract information from voice recordings: auditory, spectrographic, acoustic-phonetic, and automatic approaches. It also describes different frameworks that have been used to draw inferences from such information: likelihood-ratio, posterior-probability, identification/exclusion/inconclusive, and the UK framework. In addition, the chapter describes empirical validation of forensic voice comparison systems and briefly discusses legal admissibility.
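    For orientation, the likelihood-ratio framework named in this abstract is usually written in the odds form of Bayes' theorem. The symbols below are assumptions introduced for illustration and do not come from the chapter: $E$ is the evidence (the measured properties of the recordings), $H_s$ is the same-speaker hypothesis, and $H_d$ is the different-speaker hypothesis.

```latex
\underbrace{\frac{P(H_s \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
  =
\underbrace{\frac{p(E \mid H_s)}{p(E \mid H_d)}}_{\text{likelihood ratio}}
  \times
\underbrace{\frac{P(H_s)}{P(H_d)}}_{\text{prior odds}}
```

    In this formulation the forensic practitioner reports only the likelihood ratio, while the prior and posterior odds are matters for the trier of fact.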

    Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01) – Conclusion

    This conclusion to the virtual special issue (VSI) “Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01)” provides a brief summary of the papers included in the VSI, observations based on the results, and reflections on the aims and process. It also includes errata and acknowledgments.

    Score based procedures for the calculation of forensic likelihood ratios - scores should take account of both similarity and typicality

    Score-based procedures for the calculation of forensic likelihood ratios are popular across different branches of forensic science. They have two stages: first, a function or model takes measured features from known-source and questioned-source pairs as input and calculates scores as output; then, a subsequent model converts the scores to likelihood ratios. We demonstrate that scores which are purely measures of similarity are not appropriate for calculating forensically interpretable likelihood ratios. In addition to taking account of similarity between the questioned-origin specimen and the known-origin sample, scores must also take account of the typicality of the questioned-origin specimen with respect to a sample of the relevant population specified by the defence hypothesis. We use Monte Carlo simulations to compare the output of three score-based procedures with reference likelihood ratio values calculated directly from the fully specified Monte Carlo distributions. The three types of scores compared are: 1. non-anchored similarity-only scores; 2. non-anchored similarity and typicality scores; and 3. known-source anchored same-origin scores and questioned-source anchored different-origin scores. We also make a comparison with the performance of a procedure using a dichotomous “match”/“non-match” similarity score, and compare the performance of 1 and 2 on real data.
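    The distinction between similarity-only scores and scores that also reflect typicality can be illustrated with a toy example. This is a sketch under stated assumptions, not the procedure used in the paper: it assumes a single univariate feature, Gaussian within-source and between-source variation, and hypothetical parameter values and function names.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)

def similarity_only_score(x_q, x_k):
    """Negative absolute difference: higher means more similar."""
    return -abs(x_q - x_k)

def similarity_typicality_score(x_q, x_k, pop_mean, sd_within, sd_between):
    """Log-ratio score: fit to the known source versus fit to the population."""
    sd_pop = math.sqrt(sd_within**2 + sd_between**2)
    similarity = log_normal_pdf(x_q, x_k, sd_within)    # closeness to known source
    typicality = log_normal_pdf(x_q, pop_mean, sd_pop)  # how common x_q is overall
    return similarity - typicality

# Hypothetical population parameters (assumed for illustration only).
POP_MEAN, SD_WITHIN, SD_BETWEEN = 0.0, 1.0, 3.0

# Two pairs with the same difference (0.5) but different typicality.
typical = {"x_k": 0.0, "x_q": 0.5}    # near the population mean
atypical = {"x_k": 6.0, "x_q": 6.5}   # far from the population mean

sim_typical = similarity_only_score(typical["x_q"], typical["x_k"])
sim_atypical = similarity_only_score(atypical["x_q"], atypical["x_k"])
st_typical = similarity_typicality_score(
    typical["x_q"], typical["x_k"], POP_MEAN, SD_WITHIN, SD_BETWEEN)
st_atypical = similarity_typicality_score(
    atypical["x_q"], atypical["x_k"], POP_MEAN, SD_WITHIN, SD_BETWEEN)

print("similarity-only:", sim_typical, sim_atypical)
print("similarity and typicality:", st_typical, st_atypical)
```

    The similarity-only score cannot tell the two pairs apart, whereas the second score is higher for the atypical pair, because agreement on a rare value is stronger support for a common origin than agreement on a common value.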

    Validations of an alpha version of the E3 Forensic Speech Science System (E3FS3) core software tools

    This paper reports on validations of an alpha version of the E3 Forensic Speech Science System (E3FS3) core software tools. This is an open-code human-supervised-automatic forensic-voice-comparison system based on x-vectors extracted using a type of Deep Neural Network (DNN) known as a Residual Network (ResNet). A benchmark validation was conducted using training and test data (forensic_eval_01) that have previously been used to assess the performance of multiple other forensic-voice-comparison systems. Performance equalled that of the best-performing system with previously published results for the forensic_eval_01 test set. The system was then validated using two different populations (male speakers of Australian English and female speakers of Australian English) under conditions reflecting those of a particular case to which it was to be applied. The conditions included three different sets of codecs applied to the questioned-speaker recordings (two mismatched with the set of codecs applied to the known-speaker recordings), and multiple different durations of questioned-speaker recordings. Validations were conducted and reported in accordance with the “Consensus on validation of forensic voice comparison”

    A strawman with machine learning for a brain: A response to Biedermann (2022) the strange persistence of (source) “identification” claims in forensic literature

    We agree wholeheartedly with Biedermann (2022) FSI Synergy article 100222 in its criticism of research publications that treat forensic inference in source attribution as an “identification” or “individualization” task. We disagree, however, with its criticism of the use of machine learning for forensic inference. The argument it makes is a strawman argument. There is a growing body of literature on the calculation of well-calibrated likelihood ratios using machine-learning methods and relevant data, and on the validation under casework conditions of such machine-learning-based systems.

    Speaker Verification using Pole/Zero Estimates of Nasals

    The acoustics of nasals are an important source of speaker-discriminating features. Nasal spectra contain poles and zeros that depend on the nasal cavities, which are complex static structures that vary from person to person. Nasal spectra may therefore have low within-speaker and high between-speaker variability. This study applies a recent pole-zero model estimation technique, based on a logarithmic criterion, to nasal spectra to obtain pole/zero features for speaker verification. The robustness against two mismatch conditions, Lombard speech and studio versus GSM transmission channel, is evaluated and compared with an approach based on MFCC features. Furthermore, results of fusion of the nasal systems with a generic MFCC-based GMM-UBM speaker verification system are presented.

    Forensic speech science

    As part of the Expert Evidence series, this chapter is intended to be accessible to lawyers, judges, police officers, and potential jury members; however, it is hoped that it will also be of interest to forensic scientists, phoneticians / speech scientists, speech-processing engineers, and students of all these disciplines. It introduces forensic speech science in a relatively non-technical way, assuming a reader who has no prior knowledge of the subject. The 2010 edition had a heavy focus on acoustic-phonetic statistical approaches to forensic voice comparison. The 2018 edition has a heavier focus on automatic approaches, includes examples of forensic voice comparison based on real cases, and includes expanded coverage of other branches of forensic speech science. Topics covered include: the likelihood ratio framework for the evaluation of forensic evidence; approaches to forensic voice comparison; assessing the validity and reliability of forensic-comparison systems; speaker identification by laypeople; and disputed utterance analysis.